Question 1 [9 marks]


Answer each of the following multiple choice questions by clearly writing the question number and part in your answer book, followed by the letter corresponding to your answer (for example: Q1 (i) A).


(i) From what kinds of variables would side-by-side boxplots be generated? 


A. categorical only 
B. quantitative only 
C. one categorical and one quantitative 


D. varies according to situation [1 Mark] 


(ii) Weight is a measure that tends to be normally distributed. Suppose the mean weight of all women at a large university is 67 kgs, with a standard deviation of 6 kgs. If you took a random sample of 36 university women from this population, there would be a 68% chance that the sample mean weight (x̄) would be between:

A. 55 and 79 kgs

B. 61 and 73 kgs

C. 64 and 70 kgs

D. 66 and 68 kgs [1 Mark]


(iii) In a survey, students are asked how many hours they study in a typical week. A five-number summary of the responses is: 2, 9, 14, 20, 60.

Which interval describes the number of hours spent studying in a typical week for about 50% of the students sampled?

A. 14 to 20

B. 9 to 20

C. 9 to 14

D. 2 to 9 [1 Mark]




(iv) The correlation between two sets of test scores was found to be exactly 1. Which of the following would not be true of the corresponding scatterplot?

A. Every point would lie along a perfect straight line, with no deviations at all.
B. The slope of the best fitting line would be 1.
C. The best fitting line would have a positive slope. [1 Mark]


(v) Which statistical procedure would most likely be used to answer the following research question?

Is ethnicity related to political party affiliation (Labor, Liberal, Greens, Other)?

Assume that all conditions for using the procedure have been met.

A. Construct a confidence interval for the difference between two means (independent samples).
B. Test for a difference in more than two means (one-way ANOVA).
C. Test that a correlation coefficient is not equal to 0.
D. Use a chi-squared test of association. [1 Mark]
(vi) Two random samples are selected from a large population of measurements. Some of the statistical measures listed will change from sample to sample and some will stay the same. Which of the following sets of measures is fixed and will not change from sample to sample?

[1 Mark]




(vii) Which one of these statistics is unaffected by outliers? 
A. Range 
B. Standard deviation 
C. Mean 


D. Interquartile range [1 Mark]


(viii) Suppose that in a large population, the proportion of people who are left-handed is p = 0.10. Suppose n = 20 people will be randomly selected, and let X be the number of people in the sample who are left-handed. What probability model should be used to find the probability that X = 3?


A. Binomial 


B. Poisson 


C. Normal [1 Mark] 


(ix) In a study on the effect of glucose on insulin release, 12 identical tissue specimens were divided into three groups (4 specimens per group). Three levels (low = 1, medium = 2 and high = 3) of glucose concentration (con) were randomly assigned to the three groups. The amounts of insulin released by the tissue samples were recorded. The data were then analysed using R. Output for the study is given below.

            Df  Sum Sq  Mean Sq  F value  Pr(>F)
con          2   10.29     5.15     9.31   0.006
Residuals    9    4.98     0.55

What is the appropriate conclusion to draw from this analysis?

A. Higher glucose concentrations result in lower insulin levels.

B. There is not a significant difference among the mean insulin levels for the different glucose levels.

C. There is a significant difference among the mean insulin levels for the different glucose levels.

D. Higher glucose concentrations produce higher insulin levels. [1 Mark]
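Each Mean Sq in an ANOVA table is its Sum Sq divided by its Df, and the F value is the ratio of the treatment mean square to the residual mean square. A minimal Python sketch of that arithmetic, using the rounded values printed in the table (so it agrees with the printed F only up to rounding); the function name `anova_f` is illustrative:

```python
def anova_f(ss_between: float, df_between: int, ss_within: float, df_within: int) -> float:
    """One-way ANOVA F statistic: (SS_between / df_between) / (SS_within / df_within)."""
    ms_between = ss_between / df_between  # Mean Sq for the treatment row (con)
    ms_within = ss_within / df_within     # Mean Sq for the Residuals row
    return ms_between / ms_within


# Rounded sums of squares and degrees of freedom copied from the R output
f_con = anova_f(10.29, 2, 4.98, 9)
print(f"F = {f_con:.2f}")
```

With these inputs the sketch gives F ≈ 9.30 against the printed 9.31; the small gap comes from the sums of squares being rounded in the output.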
Question 2 [7 marks]
Improperly handled or undercooked poultry and eggs are the most frequent cause of salmonella food poisoning. Suppose that the mean number of reported cases per month in a certain state of Australia is 1.7.


(a) Using correct notation, give

(i) the probability distribution that could be used to describe the distribution of cases per month in that state. [2 Marks]

(ii) the standard deviation of the distribution. [1 Mark]

(b) Showing all calculations and using correct probability notation, find the probability that, for a month chosen at random,

(i) exactly 2 cases were reported. [2 Marks]

(ii) more than 4 cases were reported. [2 Marks]


You should refer to Figure 1, below.

Figure 1: [Plot of cumulative probability against number of cases]
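If the monthly count X is modelled as Poisson with mean λ = 1.7 (the usual model for counts of events in a fixed period, assumed here), then P(X = k) = e^(−λ)λ^k / k!, SD(X) = √λ, and P(X > 4) = 1 − P(X ≤ 4). The Python sketch below computes these directly and could serve as a cross-check on values read from Figure 1; the helper names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k), obtained by summing the pmf from 0 to k."""
    return sum(poisson_pmf(i, lam) for i in range(k + 1))


lam = 1.7                                  # mean reported cases per month
sd = math.sqrt(lam)                        # Poisson SD is the square root of the mean
p_exactly_2 = poisson_pmf(2, lam)          # P(X = 2)
p_more_than_4 = 1 - poisson_cdf(4, lam)    # P(X > 4) = 1 - P(X <= 4)

print(f"sd = {sd:.4f}, P(X = 2) = {p_exactly_2:.4f}, P(X > 4) = {p_more_than_4:.4f}")
```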
Question 3 [5 marks]

In a lab-based experiment (Cornwall et al. 2013, Proceedings of the Royal Society B), the growth rate of a macroalga (A. corymbosa) under static versus fluctuating pH was measured. For the static pH treatment, the pH was kept constant at 8.05 for 40 days. For the fluctuating pH treatment, the average pH was also 8.05, but fluctuated by ±0.40 units on a diurnal cycle for 40 days, mimicking the natural pH cycle of Otago Harbour, New Zealand. The primary outcome measure for the experiment was the relative growth rate (d⁻¹), measured as a continuous variable at the end of the 40-day experiment.

For the static pH treatment (n = 12), the average relative growth rate was x̄₁ = 0.0040 with estimated standard deviation s₁ = 0.0030. For the fluctuating pH treatment (n = 12), the average relative growth rate was x̄₂ = 0.0025 with estimated standard deviation s₂ = 0.0027. The estimated difference of the means, μ₁ − μ₂, was 0.0015, with 95% confidence interval (−0.0009, 0.0039).


Answer the following: 


(a) The pooled standard error sₚ was used to calculate the 95% confidence interval. Why is use of the pooled standard error justified for this experiment? [1 mark]

(b) Describe two conditions that should be checked to justify the validity of t-based confidence intervals for relatively small samples (n < 30). [2 marks]

(c) For future experiments, is it likely to matter which pH treatment is used when studying growth of A. corymbosa? Use both the estimated difference and the confidence interval to answer this question. [2 marks]
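Under the equal-variance assumption, the two sample variances are pooled as sₚ² = [(n₁ − 1)s₁² + (n₂ − 1)s₂²]/(n₁ + n₂ − 2), and the interval is (x̄₁ − x̄₂) ± t* sₚ√(1/n₁ + 1/n₂) on n₁ + n₂ − 2 = 22 degrees of freedom. The Python sketch below applies this to the summary statistics given. The critical value t* = 2.074 for 22 degrees of freedom is not stated in the question; it is supplied by hand here (an assumption), since Python's standard library has no t quantile function.

```python
import math


def pooled_t_interval(mean1, s1, n1, mean2, s2, n2, t_star):
    """Pooled-variance CI for mu1 - mu2; returns (estimate, lower, upper)."""
    df = n1 + n2 - 2
    sp = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df)  # pooled SD
    se = sp * math.sqrt(1 / n1 + 1 / n2)                        # SE of the difference
    diff = mean1 - mean2
    return diff, diff - t_star * se, diff + t_star * se


# Static (1) versus fluctuating (2) pH; t* = 2.074 assumed for 22 df
est, lower, upper = pooled_t_interval(0.0040, 0.0030, 12, 0.0025, 0.0027, 12, t_star=2.074)
print(f"{est:.4f} ({lower:.4f}, {upper:.4f})")
```

With these inputs the sketch reproduces the reported interval (−0.0009, 0.0039) after rounding to four decimal places.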
Question 4 [7 marks]
Many patients undergoing treatment for substance abuse begin taking drugs again within 12 months. In fact, for many treatment approaches, only 40% of patients undergoing treatment are drug-free 12 months after treatment, and treatments with higher rates of abstinence are considered to be successful.

In a study of the effect of EEG biofeedback training on drug treatment (Scott et al. 2005, The American Journal of Drug and Alcohol Abuse), 36 of 47 patients who underwent a 12-week in-patient treatment were drug-free 12 months after treatment. You may assume that the patients in the study are a representative random sample of the population of possible patients.


(a) Estimate the proportion of patients who underwent EEG biofeedback training who were drug-free 12 months after treatment (p). [1 mark]

(b) Calculate a 95% confidence interval for p using z* = 1.96. [2 marks]

(c) Are the two conditions that allow use of the formula in (b) met? Justify your answer. [2 marks]

(d) Carefully interpret the confidence interval from (b) to assess whether EEG biofeedback training is a successful treatment for substance abuse. [2 marks]
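A 95% interval for a single proportion is commonly computed with the large-sample (Wald) formula p̂ ± z*√(p̂(1 − p̂)/n). The Python sketch below computes it and checks the success/failure counts against a threshold of 10; that threshold is a common rule of thumb assumed here (some texts use 5 or 15). The randomness condition is a property of the study design and cannot be checked in code.

```python
import math


def wald_interval(successes: int, n: int, z_star: float = 1.96):
    """Large-sample CI for a proportion: p_hat +/- z* sqrt(p_hat (1 - p_hat) / n)."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, p_hat - z_star * se, p_hat + z_star * se


def large_sample_ok(successes: int, n: int, minimum: int = 10) -> bool:
    """Success/failure rule of thumb: at least `minimum` successes and failures."""
    return successes >= minimum and (n - successes) >= minimum


p_hat, lower, upper = wald_interval(36, 47)
print(f"p_hat = {p_hat:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
print("success/failure condition met:", large_sample_ok(36, 47))
```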
Question 5 [5 marks]
In capture-mark-recapture (CMR) studies, researchers capture and mark animals over a number of occasions in order to study demographic features of the population. Animals that are captured may be captured multiple times or just once. Capture-recapture studies generally assume that capture rates are the same for all animals, but this is not always the case. Some animals can become trap-dependent, either being ‘trap-happy’ or ‘trap-shy’. For example, if animals are lured with food, previously captured animals may learn to be recaptured, while if the capture procedure is unpleasant, previously captured animals may learn to avoid capture.

A Chi-Squared test for trap-dependence assumes that whether an animal is missed or caught on occasion t is independent of whether the animal is missed or caught on occasion t + 1. If the null hypothesis of independence is rejected, the interpretation is that there is evidence of trap-dependence.

Consider the following table of observed captures for n = 155 animals, comparing captures on the 3rd and 4th capture occasions.

                  Missed @ t+1=4   Caught @ t+1=4
Missed @ t=3            60               30
Caught @ t=3            40               25


(a) How many degrees of freedom are there for this Chi-Square Test? [1 mark]

(b) Calculate E₁₁ for the number of animals that would be missed on both occasions. (Recall, Eᵢⱼ = RᵢCⱼ/n) [1 mark]

(c) Calculate ee for the number of animals that would be missed on both occasions. 
[1 mark]

(d) Given that the p-value for this test is 0.63, write a conclusion. [2 marks]
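For an r × c table, the chi-squared test of independence has (r − 1)(c − 1) degrees of freedom, each expected count is Eᵢⱼ = RᵢCⱼ/n, and the Pearson statistic sums (O − E)²/E over all cells. The Python sketch below applies these formulas to the capture table; the function names are illustrative. It computes the uncorrected Pearson statistic only and is not meant to reproduce the quoted p-value, which depends on the software and on whether a continuity correction was applied.

```python
def expected_counts(table):
    """E_ij = R_i * C_j / n for a two-way table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square_statistic(table):
    """Pearson statistic: sum of (O - E)^2 / E over all cells, no continuity correction."""
    expected = expected_counts(table)
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: missed / caught at t = 3; columns: missed / caught at t + 1 = 4
observed = [[60, 30],
            [40, 25]]
df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
e11 = expected_counts(observed)[0][0]               # expected number missed on both occasions
print(df, round(e11, 2), round(chi_square_statistic(observed), 3))
```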
Question 6 [9 marks]


An additive is given to the diet of 50 cows. The levels of the additive are 0, 0.1%, 0.2%, 
or 0.3%. It is hoped that the additive will increase the level of milk fat. 


The results of a linear regression of milk fat % (milk.fat) on % additive (percent.additive) are shown below (you may assume all regression assumptions have been met):

Call:
lm(formula = milk.fat ~ percent.additive, data = milk.study)

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)        Si2fo2     0.1038  31.546  < 2e-16
percent.additive   2.0132     0.5582   3.607 0.000736

Residual standard error: 0.4341 on 48 degrees of freedom
Multiple R-squared: 0.2132, Adjusted R-squared: 0.1969
F-statistic: 13.01 on 1 and 48 DF, p-value: 0.0007363


(a) Write down the regression equation relating % additive to milk fat %. [1 mark]

(b) Is there evidence that the additive increases the level of milk fat? [1 mark]

(c) Interpret the R² statistic. [1 mark]

(d) Predict the milk fat % when the additive level is 0.2%. [1 mark]

One of the cows receiving 0.2% additive in her feed had a milk fat level of 4.29%.

(e) Calculate the residual for this cow. [1 mark]

(f) Calculate the 95% confidence interval for the slope. Use t* = 2.01. [2 marks]

(g) Use the estimated regression equation to predict how much additive would be required to achieve, on average, 5% milk fat. [1 mark]

(h) Explain why the prediction from (g) might be misleading. [1 mark]
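The fitted line has the form ŷ = b₀ + b₁x, so a prediction, a residual (observed minus fitted), a confidence interval for the slope (b₁ ± t* SE(b₁)), and the additive level needed for a target milk fat % (x = (y − b₀)/b₁) all follow from the coefficient table. The intercept estimate is not legible in the extracted R output, so the Python sketch below approximates it as t value × Std. Error (about 3.27); that reconstruction is an assumption, and any result depending on it is approximate.

```python
# Coefficients from the R output. The intercept estimate is illegible in the
# extracted text, so it is approximated as t value x Std. Error (an assumption).
b0 = 31.546 * 0.1038   # approximate intercept, about 3.27
b1 = 2.0132            # estimated slope for percent.additive
se_b1 = 0.5582         # standard error of the slope
t_star = 2.01          # critical value given in the question


def predict_milk_fat(additive_pct: float) -> float:
    """Fitted milk fat % from the estimated line b0 + b1 * x."""
    return b0 + b1 * additive_pct


def residual(observed_fat: float, additive_pct: float) -> float:
    """Residual = observed value minus fitted value."""
    return observed_fat - predict_milk_fat(additive_pct)


def additive_for_target(target_fat: float) -> float:
    """Solve target = b0 + b1 * x for x, i.e. x = (target - b0) / b1."""
    return (target_fat - b0) / b1


# 95% CI for the slope: b1 +/- t* x SE(b1)
slope_ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)

print(f"fitted at 0.2%: {predict_milk_fat(0.2):.3f}")
print(f"residual for the 4.29% cow: {residual(4.29, 0.2):.3f}")
print(f"slope CI: ({slope_ci[0]:.3f}, {slope_ci[1]:.3f})")
print(f"additive for 5% fat: {additive_for_target(5.0):.3f}")
```

Comparing the value returned by `additive_for_target` with the 0% to 0.3% range of additive levels used in the study is one way to see whether a prediction extrapolates beyond the data.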


Question 7 [8 marks]


Gaseous emissions from volcanoes can have a significant impact on ecosystems. For example, volcanic CO2 vents can lower the pH of the oceanic water column. CO2 emissions were measured at four different volcanic vents. An analysis of variance was conducted to compare the mean emissions for the four vents.


(a) State the null hypothesis (H₀) and the alternative hypothesis (Hₐ) for this study in terms of the four means. [2 marks]


(b) Is there evidence that mean emissions vary among the four vents? Explain using the ANOVA results in Table 1.

Table 1:

Analysis of Variance Table

Response: CO2
          Df Sum Sq Mean Sq F value   Pr(>F)
vent       3    612   204.1    8.56  0.00096
Residuals 18    429    23.8

[2 marks]


(c) With reference to the numerical summary given in Table 2, verify that the 95% confidence interval for the mean CO2 emission from Vent 2 is (31.6, 40.0). Use t* = 2.1. [2 marks]


(d) With reference to the 95% confidence intervals for the means given in Table 2, write an informative conclusion.


Table 2:

Vent   mean    sd     2.5%    97.5%
1      30.2    2.59   25.6    34.8
2      35.8    ?      31.6    40.0
3      ?       ?      33.1    43.4
4      24.9    6.36   ?       28.7

[2 marks]
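A common way to form an interval for one group mean after an ANOVA (assumed here) uses the pooled standard deviation sₚ = √(Residual Mean Sq) from Table 1, giving x̄ᵢ ± t* sₚ/√nᵢ. The group sizes are not shown in the extracted Table 2, so the Python sketch below takes n = 6 for Vent 2 and a mean of 35.8 (the midpoint of the stated interval); both inputs are assumptions, and the function name is illustrative.

```python
import math


def group_mean_ci(mean: float, pooled_sd: float, n: int, t_star: float):
    """CI for one group mean using the ANOVA pooled SD: mean +/- t* s_p / sqrt(n)."""
    margin = t_star * pooled_sd / math.sqrt(n)
    return mean - margin, mean + margin


pooled_sd = math.sqrt(23.8)  # square root of the Residuals Mean Sq in Table 1
# Assumed inputs: mean 35.8 (midpoint of the stated interval) and n = 6
lower, upper = group_mean_ci(35.8, pooled_sd, n=6, t_star=2.1)
print(f"({lower:.1f}, {upper:.1f})")
```

With these assumed inputs the sketch returns (31.6, 40.0) to one decimal place.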
Formulae are given on page 12


Please remember: This examination question paper MUST BE HANDED IN. Failure to do so may result in the cancellation of all marks for this examination. Writing your name and number on the front will help us confirm that your paper has been returned.
Formulae
P(A|B) = P(A) if A and B are independent 


P(A and B) = P(A)P(B) if A and B are independent 